Multinomial Mixture Modelling for Bilingual Text Classification
نویسندگان
چکیده
Mixture modelling of class-conditional densities is a standard pattern classification technique. In text classification, the use of class-conditional multinomial mixtures can be seen as a generalisation of the Naive Bayes text classifier relaxing its (class-conditional feature) independence assumption. In this paper, we describe and compare several extensions of the class-conditional multinomial mixture-based text classifier for bilingual texts.
منابع مشابه
Model-Based Estimation of Word Saliency in Text
We investigate a generative latent variable model for modelbased word saliency estimation for text modelling and classification. The estimation algorithm derived is able to infer the saliency of words with respect to the mixture modelling objective. We demonstrate experimental results showing that common stop-words as well as other corpus-specific common words are automatically down-weighted an...
متن کاملBilingual Text Classification using the IBM 1 Translation Model
Abstract Manual categorisation of documents is a time-consuming task that has been significantly alleviated with the deployment of automatic and machine-aided text categorisation systems. However, the proliferation of multilingual documentation has become a common phenomenon in many international organisations, while most of the current systems has focused on the categorisation of monolingual t...
متن کاملLarge margin multinomial mixture model for text categorization
In this paper, we present a novel discriminative training method for multinomial mixture models (MMM) in text categorization based on the principle of large margin. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as linear programming (LP) problems, which can be efficiently and reliably solved by many general optimization tools even fo...
متن کاملClustering Images with Multinomial Mixture Models
In this paper, we propose a method for image clustering using multinomial mixture models. The mixture of multinomial distributions, often called multinomial mixture, is a probabilistic model mainly used for text mining. The effectiveness of multinomial distribution for text mining originates from the fact that words can be regarded as independently generated in the first approximation. In this ...
متن کاملA Generative Model for Self/Non-self Discrimination in Strings
A statistical generative model is presented as an alternative to negative selection in anomaly detection of string data. We extend the probabilistic approach to binary classification from fixed-length binary strings into variable-length strings from a finite symbol alphabet by fitting a mixture model of multinomial distributions for the frequency of adjacent symbols. Robust and localized change...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006